From YouTube: CGD Seminar Series - Claire Monteleoni
A
But as Katie mentioned, I've been working on climate informatics due to the threat of climate change: the extreme storms and natural disasters we've been seeing, and their threats to communities and ecosystems. This line of research is based on a vision that machine learning can help shed light on climate change.
A
I started trying to argue this to people in the AI and machine learning field about six years ago. The way we broke down where we thought machine learning might help, and was already helping, included questions of paleoclimate reconstruction. For us, this is a very large, sparse matrix in space and time: you have paleo proxy data that goes back very far in time, but only at a few sparse spatial locations.
A
More generally speaking, as I'm sure all of you know, because you're intimate with a lot of the data and the modeling, this is a huge playground of spatiotemporal data and the nonlinear dependencies we might have over both time and space.
A
But again, this was six years ago, when we were trying to carve out the space for machine learning. We've also seen a bunch of other applications, and as we move more broadly toward climate change mitigation, there are things like collaborating with NREL to try to robustify forecasts of solar output on the order of minutes, which is something we started to do as well.
A
But,
generally
speaking,
you
know,
I'm
sure
you
guys
have
problems
that
aren't
on
this
slide,
where
I
believe
machine
learning
could
probably
have
an
impact,
and
so
I
look
forward
to
the
discussion.
A
I think I only reliably have time for one case study, so I'm going to do it on downscaling, because that's relatively general; we've shown the downscaling example on both temperature and precipitation. If there's time, I'll talk about an avalanche detection task, which is an instance of anomaly detection, meaning there's severe class imbalance. It's a rare event, and in our setting we had a ground survey, but only a very limited amount of survey data, along with, of course, plenty of satellite data.
A
I generally like to start with an idea to hold onto, sort of a punch line, but this is just jumping into the details. For people for whom this doesn't make sense, don't worry, we'll go through the case studies; but if you're already familiar with what deep learning is basically doing, I want to make the jump from supervised to unsupervised.
A
So
I'd
like
to
just
boil
down
my
entire
network
of
weights
and
activations
to
this
w,
representing
all
the
parameters
that
are
going
to
be
learned
from
data
and
the
way
a
neural
network
is
trained.
Is
that
you
apply
your
whole
network
f
w
to
an
input
example
say
an
image
or
like
a
vector
or
tensor
of
data
and
make
some
output.
It
could
be
a
scalar
or
a
vector
we'll
call
that
y-hat,
but
training
in
the
machine.
Learning
context
is
trying
to
do
some
sort
of
gradient
descent
on
a
loss
function.
A
The loss penalizes how well your network output y-hat approximates the correct label for your input x. If you know what the ground-truth target should be for input x, we call that y, and then, given many input-output pairs where y is the desired output, we can train.
A
We can fit the parameters w. How do we do that? We do stochastic gradient descent on the loss function. Remember, the loss function compares the network's output to the target label, the ground-truth desired output on input x, and then, via the chain rule for taking derivatives, we get the incremental updates for every single network weight, no matter the architecture.
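In code, that whole recipe (apply f_w, compare y-hat to y in a loss, backpropagate via the chain rule, nudge every weight) is only a few lines. A minimal sketch, with a single linear "network" and squared loss; the toy data and learning rate here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy supervised data: y is roughly 3 * x plus a little noise.
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=100)

w = np.zeros(1)  # all learnable parameters, the "w" on the slide

for step in range(300):
    i = rng.integers(len(X))             # stochastic: one example at a time
    y_hat = X[i] @ w                     # forward pass: f_w(x)
    grad = 2.0 * (y_hat - y[i]) * X[i]   # chain rule on the squared loss
    w -= 0.05 * grad                     # incremental update to every weight
```

After training, w has been pulled toward the true slope of 3 purely by gradient steps on the loss.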
A
So that's when you have a label, and I'm trying to argue that this is not too dissimilar from the case where you don't have a label. If you don't even know what "label" means, because it's jargon: you might simply be in the case where you don't really have labels, meaning you don't have enough ground-truth examples of input-output pairs. But that's okay.
A
So
you
have
data
and
we'll
still
have
our
network
parametrized
by
all
the
parameters
and
kind
of
architecture,
choices
that
I'm
indicating
by
f
of
w
and
so
now
I'll
just
call
the
network
output
x
hat
because
there's
no
independent
or
there's
no
label
y.
So
that
means
we
can
write
down
a
loss
function
or
some
objective
function
that
you're
trying
to
minimize
via
training
but-
and
that's
perfectly
fine
just
that
now
that
loss
function
can
only
depend
on
the
network's
input
and
the
network's
output.
There's
no
external
information.
A
But assuming you write down such a loss function, then everything else follows similarly: you're doing gradient descent to push the output of the network to be more similar to the input, in whatever way you've defined in the loss.
A
Here
you
do
gradient
descent
on
that
and
that'll
tell
you
how
to
update
all
the
network
parameters,
so
this
should
hopefully
be
extraordinarily
freeing,
because,
usually
the
bottleneck
is
how
much
data
you
have
where
you
know
what
the
ground
truth
label,
like
classification,
for
example,
should
be,
and
so
this
should
hopefully
allow
you
to
think
more
creatively
about
how
machine
learning
can
be
applied
in
your
in
your
science.
A
One
form
that
I'm
not
going
to
talk
about
today
of
of
loss
function
relates
to
clustering,
so
objectives
for
doing
exploratory
data
analysis,
hierarchical
clustering,
k,
center
clustering,
a
lot
of
spectral
clustering.
All
of
these
have
mathematical
objectives
that
you
can
write
down.
That
only
depend
on
the
input
data
so
as
we
consider
either
an
entire
training
data
set
or
many
batches.
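To make "an objective that depends only on the input data" concrete, here is the familiar k-means objective, a close cousin of the clustering objectives just mentioned, evaluated and greedily improved on invented toy data. This is only an illustrative sketch, not the speaker's method:

```python
import numpy as np

rng = np.random.default_rng(0)
# Unlabeled data: two well-separated 2-D blobs.
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(3.0, 0.3, (50, 2))])

def kmeans_objective(X, centers):
    """Sum of squared distances to the nearest center.
    Note it touches only the input data: no labels anywhere."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).sum()

centers = X[rng.choice(len(X), 2, replace=False)]
for _ in range(10):  # Lloyd's iterations drive the objective down
    assign = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
    centers = np.array([X[assign == k].mean(0) if (assign == k).any()
                        else centers[k] for k in range(2)])
```

The objective is minimized using nothing but the unlabeled inputs, which is exactly the point being made about unsupervised losses.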
A
Usually
we
would
think
of
it
as
a
compact
representation,
so
we
can
think
of
it
as
automatic
feature
extraction,
and
so
we
can
think
of
this
machine
which
is
sort
of
a
useless
stupid
machine
that
we
don't
think
we
would
really
need,
because
it's
almost
doing
this
noisy
version
of
an
identity
function,
we're
going
to
give
it
input
here.
It's
gray,
scale
images,
so
each
pixel
lives
in
grayscale
and
the
images
were
generated
by
humans,
like
writing,
addresses
at
the
post
office
and
so
we're
trying
to
do
digit
recognition.
A
But
the
training
is
that
loss
that
compares
the
input
of
the
network
to
the
output
of
the
network,
as
we
showed
on
the
previous
slide
and
tries
to
push
them
close
together,
and
we
do
gradient
descent
on
that
loss
to
learn
the
weights
in
these
blocks.
You
don't
really
have
to
know.
What's
going
on
other
than
that,
there
are
parameters
that
we
fit
here
and
so
at
the
end,
we
we
get
this
stupid
machine
that
tries
to
make
an
input
like
an
output
and
even
has
some
loss
or
error.
A
But
this
is
what's
called
a
pretext
task
so
that
we've
learned
this
kind
of
noisy
identity
function
for
as
a
pretext
to
instead
look
at
what
we
got
in
the
bottleneck
is
that
if
we've
sent
the
data
in
in
training
through
a
lower
dimension,
then
we
might
get
some
sort
of
compact
representation
now
of
the
whole
data
distribution
that
we
trained
on.
A
That's
that's
the
high
level
idea,
and
this
is
the
idea
behind
an
auto
encoder,
and
so
typically,
if
your
input
is
at
some
dimension,
which
is
also
the
dimension
of
your
output,
when
you're
trying
to
generate
images-
or
you
know,
data
similar
to
the
input-
then
we
might
think
of
this
latent
representation
as
a
compact
representation
of
the
data.
A
Again, these are all the network parameters that we are going to estimate while trying to minimize the reconstruction error between output and input. I would say, for the most part in climate and the other applications you care about, you would be looking for a compact representation, because that's some way to summarize your data, or to extract features automatically. But I will just mention, and this is sort of curious: cognitive scientists, or vision scientists, believe that in human cognition we're doing something similar, but sending the data through a wide layer that is actually higher-dimensional than the input and output dimension. And from neuroscience there's evidence of this in other systems, like the fruit fly's olfactory system: they map the smells they receive as input into a higher-dimensional space than the sensory input space. The point of doing that: you're certainly less compact in dimension than the input, but within that overcomplete representation,
A
The
goal
is
that
the
representations
will
then
be
quite
sparse,
which
helps
you
to
distinguish
different.
You
know,
sense
tokens,
so
that's
in
cognitive,
science.
So
vision
people
might
use
wide
or
over
complete,
auto
encoders.
A
So
the
first
idea
of
a
variational
autoencoder
was
to
fit
a
gaussian
distribution,
so
you'll
start
with
some
simple
prior,
but
then
you
could
actually
learn
a
full
covariance
structure.
A
Well
here
this
is
actually
spherical,
but
you
could
learn
a
full
gaussian
distribution
over
latent
representations
and
and
then
shortly
I'll
talk
about
the
next
step
away
from
this,
where
it
doesn't
even
have
to
be
gaussian
so
anyway,
that's
sort
of
basic
things
that
I'm
gonna
touch
on
in
unsupervised.
Learning,
as
I
mentioned,
my
group
is
getting
really
excited
around
unsupervised
learning
and
has
done
some
work
on
avalanche
detection.
I
probably
won't
have
time
to
get
to
that.
A
Maybe
at
the
end,
I'm
mostly
going
to
talk
about
a
downscaling
task,
and
here
it's
an
unsupervised
method,
whereas
the
previous
one
was
semi-supervised.
It
was
sort
of
an
unsupervised
plus
supervised
pipeline
for
this
one,
it's
unsupervised,
but
I'm
going
to
talk
about
self-supervision
briefly,
as
my
interpretation
of
why
this
actually
works
so
well
in
practice.
A
So
brian,
who
I
actually
think
was
a
a
summer
intern
at
some
lab
in
encar
a
couple
of
years
back
did
a
master's
here
in
computer
science
and
then
graduated
from
his
master's
thesis
and
went
on
to
a
climate
science
lab
in
potsdam
like
albert
wagner
or
something
but
anyway
this
this.
This
first
project,
I'm
going
to
talk
about,
was
his
master's
thesis.
A
So
you
know
for
this
audience.
I
don't
need
to
distinguish
too
much
what
I
mean
about
downskilling.
I
will
warn
you,
though,
as
when
you,
when
you
go
out
to
get
software
packages
from
machine
learning
and
computer
vision
about
the
terminology,
so
sometimes
we'll
talk
about
up
sampling
or
super
resolution,
meaning
actually
downscaling
so
up
and
down
get
reversed.
So,
let's
be
very
clear:
we
want
to
use
coarse
scale,
space,
geotemporal
data
fields
to
infer
values
at
finer
scales.
A
Actually,
ultimately,
the
method
that
we
provide
is
symmetric,
so
you
can
also
go
from
fine
to
core
scale
if
that
would
be
needed
and,
of
course,
there's
a
whole
field
on
this.
From
our
perspective,
thinking
phrasing
this
using
you
know,
machine
learning
jargon,
is
that
the
methods
that
we
saw
were
supervised
learning
methods.
A
You
would
need
to
see
the
field
at
a
course
scale,
and
the
corresponding
field
at
fine
scale
for
and
you'd
have
need
to
have
many
of
these
paired
instances
where
you
get
both
coarse-scaled
and
fine-scale
data
paired
in
order
to
train
your
method,
so
we'd
like
to
move
away
from
that
first
off.
Secondly,.
A
They would tend to provide point predictions, meaning that once trained, you would give such a method a map of your variable at the coarse scale and it would output one instantiation at fine scale. We're calling that a point prediction even though it's a whole map at the finer scale. We would actually prefer a distribution over such maps at fine scale, and so we're going to call that generative downscaling. Brian's approach to generative downscaling used a problem definition in machine learning called domain alignment, as well as some very recent approaches addressing domain alignment in deep unsupervised learning. Self-supervision is when there is some kind of underlying structure, usually temporal or spatial, that connects your data, so that you don't need external labels. My interpretation of why this technique actually worked so well for generative downscaling is that we didn't need to train on any paired images.
A
Okay, so not only can you map between data at different spatial scales, in this case one-degree lat/lon box resolution versus one-eighth degree, but also these data sets could be quite different. We have reanalysis data from ERA, right.
A
So
that's
generally
thought
of
as
coming
from
observations,
even
though
it's
smoothed
a
bit
through
models
and
then
nwp
data,
which
is
of
course
from
wharf,
which
is
you
know,
simulated
and
based
on
physics,
and
so
we
did
two
separate
experiments
or
two
separate
studies,
one
for
temperature
and
one
for
precip.
So,
on
the
left
hand
side,
you
have
the
coarse
scale
resolution
for
temperature
and
per
sip
and
then
on
the
right
hand,
side.
A
You
have
the
finer
scale,
but
again
one
is
from
era
reanalysis
and
the
other
is
the
output
of
warf
okay.
So
this
domain
alignment
task
is
saying
that
I
have
two
random
variables
and
what
I'd?
Like
to
learn
is
a
bijection
so
that
if
I
have
samples
from
the
marginal
of
x-
and
I
apply,
this
function
f
I'll
approximate,
the
marginal
of
y
for
two
random
variables.
A
And
similarly,
if
I
have
iid
samples
from
the
marginal
of
y-
and
I
apply
f
inverse-
I
can
approximate
the
marginal
of
x
so
to
clarify
these
x's
and
y's
are
not
I'm
supposed
to
be
example
and
label.
These
are
just
two
different
random
variables
and
for
our
purposes
you
know
one
is
going
to
be
the
course
scaled
distribution
and
one
is
going
to
be
the
fine
grained
scaled
distribution
and
we
don't
need
any
pairings
between
x
and
y.
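In one dimension there's a classical instance of exactly this idea: quantile mapping, long used in statistical downscaling for bias correction. The sketch below is only to make "match the marginals with an invertible map, no pairing needed" concrete, on invented Gaussian toy marginals; the method in the talk learns a deep invertible map instead:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 5000)  # samples from the marginal of X (coarse)
y = rng.normal(5.0, 2.0, 5000)  # samples from the marginal of Y (fine)
# No pairing between x[i] and y[i] is ever used below.

xs, ys = np.sort(x), np.sort(y)

def f(v):
    """Monotone (hence invertible) map sending the X marginal to the Y
    marginal: empirical CDF of X, then empirical quantile function of Y."""
    u = np.searchsorted(xs, v) / len(xs)   # approximate F_X(v) in [0, 1]
    return np.quantile(ys, np.clip(u, 0.0, 1.0))

mapped = f(x)  # now distributed approximately like Y
```

Pushing unpaired X samples through f reproduces Y's marginal, which is the one-dimensional shadow of what the flow-based bijection does for whole fields.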
A
This latent space, which before I was showing as a distribution over those bottleneck representations, is just some space that I'm going to learn, starting with some prior over it, which could be very simple, like an isotropic Gaussian. But since ultimately I'd like to have a joint distribution, a joint probability density function over the data at the two different resolutions, I'm going to add an assumption: that the two domains X and Y, coarse and fine resolution, are conditionally independent given this latent space Z.
A
So
when
I've
added
that
assumption,
then
of
course
that
allows
me
to
factor
the
full
joint
between
high
resolution.
You
know
coarse
resolution,
fine
resolution
and
the
latent
space,
which
then,
of
course,
would
allow
me
to
represent
my
joint
and
so
then.
Ultimately,
I
could
sample
conditionally
on
a
coarse
grained
resolution.
I
could
sample
fine
grained
resolutions
or
vice
versa,
or
I
could
take
unconditional
samples
of
the
informative
posterior.
I've
learned
over
the
latent
space
and
then
sample
at
either
lower
high
resolution.
A
So there is that assumption of conditional independence over a shared latent space, and everything works subject to that assumption. Then, for learning the conditional distribution of one space given the latent space, and similarly for the other, we're going to use a technique that came out last year at AAAI. In terms of getting a more informative distribution over the latent space, I'm going to talk on the next slide about normalizing flows. But the punch line here, again, is that you might want to do downscaling in your application but not have access to paired images that really correspond between the coarse resolution and the fine resolution, whereas you may be able to get samples at either resolution as much as you want; and here the pairing between those maps, or fields, is not required.
A
That is fit by training on your data distribution, and you can do so by composing invertible transformations where you learn the parameters of each; that composition is called a flow. The punch line here: we're actually not going to use any of the flows on this slide; we're going to use an algorithm called Glow. But we're going to learn mappings that can be much more informative than just a Gaussian over our latent space Z.
A
So
one
invertible
mapping
from
one
domain
say
the
course
resolution
domain
and
then
another
one
mapping
between
z
and
the
the
fine
resolution
domain,
and
so
this
architecture
largely
follows
this
paper
a
line
flow
but
we're
using
a
normalizing
flow
called
glow,
which
is
one
by
one
invertible
convolutions.
A
Where
is
the
learning
happening?
The
learning
is
only
happening,
so
parameters
are
only
getting
fit
in
this
glow
step,
which
are
the
parameters
that
will
instantiate
that
composed
and
invertible
mapping
between
one
data
space
and
this
latent
space
and
between
the
other
data
space
and
the
shared
latent
space.
So
we
can
have
a
different
parameter
set
for
each
of
those
invertible
mappings
and
then
that
of
course
yields
a
mapping
between
the
two
different
data
spaces.
A
So
this
latent
dimension
is
neither
wider
nor
com
compact.
It
actually
matches
the
dimension
of
of
our
fine
resolution
data
and
so
we'll
store
our
course
resolution
data
at
the
same
dimension
and
so
to
get
it
up
sampled.
We
simply
just
look
at
neighboring
cells
to
up
sample
it
and
provide
the
minimum
additional
information,
okay,
so
more
information
on
the
machine
learning
architecture
and
then
the
normalizing
flow
or
in
these
two
papers.
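That "minimum additional information" upsampling, just copying each coarse cell to fill the fine grid, is essentially one line of numpy. A sketch, with a factor of 2 purely to keep the example small:

```python
import numpy as np

coarse = np.array([[1.0, 2.0],
                   [3.0, 4.0]])  # a 2x2 coarse-resolution field

# Nearest-neighbour upsampling: each coarse cell is copied into a 2x2
# block, so the result lives at the fine-grid dimension but carries no
# information beyond the coarse field itself.
fine_shaped = np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)
```

`fine_shaped` is 4x4, matching the fine-resolution dimension the flow expects.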
A
So
this
was
pretty
new
in
statistical
downscaling,
and
so
we
didn't
have
a
way
to
compare
numerically
to
another
generative
downscaler.
This
is
the
only
one
that
we
know
of
so.
Instead
we
compared
to
point
predictions,
which
means
the
comparisons
had
to
be
run
using
paired
images.
The
tests
were
run
on
images
paired
at
the
high
res
and
lo-res,
and
so
we
could
compare
to
bcsd,
which
hopefully
you
guys
know
what
it
is.
A
It
was
sort
of
the
state
of
the
art
at
the
time
in
downscaling,
along
with
a
climate
informatics
2019
paper
by
banyo
medina,
that
used
a
convolutional
neural
network,
so
it
was
still
supervised
in
that
you
needed
paired
images,
so
we
can
do
kind
of
tests
on
a
held
out
data
set
that
you
know.
None
of
the
methods
were
trained
on
of
paired
images,
and
you
know
our
technique
didn't
do
much
worse
than
either
of
them.
A
We
wouldn't
expect
it
to
do
better,
because
the
unsupervised
task
is
generally
harder,
but
it's
pretty
comparable
and
then
we
don't
really
have
a
quantitative
way
of
showing
it,
but
where
this
shines
right
is
that
it
can
be
oh
and
we
did
a
whole
bunch
of
climb
decks
indices.
This
is
this
was
just
one
of
the
tables
in
brian's
thesis,
so
I
would
refer
you
to
that
or
to
the
archive
or
the
climate
informatics
proceedings.
A
But
how
do
these
predictions
work?
The
the
the
distributional
aspect
of
the
model,
so
we
could
input
a
course
resolution
image
to
the
model
and
for
reference,
we're
testing
this
on
a
reference
set
where
we
have
a
paired
image,
and
so
this
happened
to
be
the
paired
image
from
wharf
at
the
fine
grained
resolution.
This
is
not
given
to
the
model
and
then
on
the
right
in
in
shaded,
we
get
the
predicted
image
out
of
the
model,
but
actually
conditioned
on
this
input
image.
A
We
could
also
just
sample
a
whole
bunch
of
times
and
get
kind
of
a
distribution
over
fine-grained
predictions.
A
The
other
cool
thing
which
is
lifting
from
this
recent
idea
in
machine
learning,
is
that
of
interpolation
within
the
latent
space.
So
I
said
that
this
technique
is
symmetric
in
that
you
could
either
upscale
or
down
scale.
So
in
this
case
we
start
from
some
nwp
data,
so
some
worf
images,
the
one
at
the
far
left
top
and
far
right
top
are
in
our
data
set
and
everything
else
is
generated
by
the
model.
A
So
how
does
this
work?
Well,
we
input
this
image
map
it
to
the
latent
space
and
then
we
can
output
from
that
late
that
point.
In
latent
space
we
can
output
a
core
screen
image.
Similarly,
with
this
endpoint,
then
we
can
also
take
a
walk
in
latent
space
and
now
there's
a
whole
literature
and
machine
learning
on
how
to
do
this,
so
how
to
interpolate
in
your
along
the
distribution
that
you've
learned
over
latent
space.
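The simplest such walk is linear interpolation between the two endpoints' latent codes, decoding each intermediate point. A sketch of just the walk itself; the latent codes below are invented, whereas in the actual pipeline they come from pushing the endpoint WRF images through the learned flow:

```python
import numpy as np

def interpolate(z_a, z_b, n_steps):
    """Straight-line walk in latent space between two encoded endpoints."""
    ts = np.linspace(0.0, 1.0, n_steps)
    return np.array([(1 - t) * z_a + t * z_b for t in ts])

# Stand-in latent codes for the two endpoint images (invented numbers).
z_start = np.array([0.0, 1.0])
z_end = np.array([4.0, -1.0])

path = interpolate(z_start, z_end, 5)
# Decoding each row of `path` through the flow would yield the
# in-between maps shown between the two real images.
```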
A
And
so
you
know
these
two
wharf
images
were,
you
know,
simulating
precipitation
at
different
times,
so
we
could
actually
view
this.
As
some
you
know:
ai
conjectured,
temporal
interpolation
of
my
precipitation
map
at
fine
scale
and
at
coarse
scale.
A
So
that's
interesting
and
you
guys
can
tell
us
how
interesting
it
is,
but
it's
something
that
the
model
can
do
and
and
so,
as
I
mentioned,
you
know,
when
you
input
this
sample,
then
you're
getting
a
conditional
sample
at
the
core
scale.
You
could
also
sample
unconditionally,
so
you
could
just
sample
from
your
distribution
over
the
latent
space
and
then
generate
the
the
maps
at
higher
low
res
from
those
samples.
A
So
I
think
what
I'll
do
is
so
I
had
kind
of
two
possible
places
to
end,
depending
on
how
much
discussion
we
wanted
to
have
so
I'll,
say
some
kind
of
conclusory
and
outlook
type
statements
and
then
see
how
much
people
want
to
discuss
versus
hear
about
the
avalanche
project.
A
But
we
are
getting
interested
in
extremes.
We've
done
work
unsupervised
work
just
simply
on
how
do
you
represent
multivariate
extreme
events?
How
do
you
learn
them
from
data
in
an
unsupervised
way
in
a
soft
probabilistic
way,
where
you're
looking
at
multiple
variables
time
simultaneously,
so
different,
relative
and
specific
humidities
and
temperatures
and
pressures
and
your
method
can
detect
multiple
event
types
unlike
something
like
the
palmer
drought
severity
index?
That's
really
only
looking
for
one
event:
type:
drought.
A
We've
done
some
work
on
hurricane
track
forecasting,
where
we
would
look
at
track
data
track
history
and
then
also
spatiotemporal
fields
around
like
the
storm
center
from
the
track,
and
we
were
just
looking
at
temperature
and
pressure
for
some
of
these
studies
and
got
actually
comparable
performance
to
state
of
art
at
the
time
at
the
national
hurricane
center
avalanche
detection,
which,
if
I
have
time,
I
can
discuss
a
little
more,
both
with
supervised
learning
and
then
with
an
auto
encoder
and
then
a
semi-supervised
pipeline
and
then
other
work
around
extremes,
both
dry
and
wet
extremes
from
precipitation
in
monsoon,
and
I
was
also
recently
asked-
and
this
might
be
kind
of
a
good
conversation,
starter
or
outlook
slide.
A
What
I
see
as
sort
of
bottlenecks
or
challenges
in
climate
science.
Now
you
know
some
of
the
challenges.
A
A challenge for learning and dimensionality reduction is in large part that the amount of labeled data, data paired with some ground-truth target that you would like your model to be able to predict, is very limited, typically compared to the dimension of the data, and that trade-off is important in machine learning: if you have high-dimensional data, you'll need more labeled data to do supervised learning. That's why we're turning to unsupervised techniques. Class imbalance?
A
You
know,
that's
why
we're
looking
more
at
anomaly,
detection
type
techniques,
and
so
some
of.
B
A
We've
we've
addressed
or
were
addressing
using
those
first
two
bullets,
but
one
thought
I've
been
having
is
that,
just
in
general,
we're
in
the
measurement
era
now,
but
that
only
goes
back
so
far
in
time,
not
not
very
long
in
time,
and
we
don't
have
a
counter
factual
world
right.
A
So
we've
only
observed
a
time
series
once
per
location,
but
can
we
actually
substitute
the
diversity
and
granularity
that
we
might
need
in
our
data
to
train
machine
learning
from
the
temporal
regime
regime
where
we're
kind
of
poor
in
data
to
the
spatial
regime?
Where
we're
very
data
rich
these
days?
A
So
there's
issues
of
skill
resolution.
I
just
talked
about
downscaling
and
there's
no
reason
this
can't
apply
to
other
spatiotemporal
fields
that
you're
interested
in.
But
I
know
that
modelers
are
hoping
that
machine
learning
can
weigh
in
on
some
of
these
parameterization
questions.
Where
you
know,
moist
processes
can't
be
directly
modeled,
and
I
don't
have
too
much
to
say
about
that.
I
think
you
know
people
at
your
own
institution,
like
john
gagne
and
others
are
looking
into
that.
A
And around climate informatics: improving ensemble prediction, making it adaptive to both space and time, and using algorithms that actually learn the level of non-stationarity in both space and time while making these adaptive ensembles; and then, of course, interpretability.
A
So
I
think
interpretability
has
informed
all
our
work,
but
there's
really
a
long
ways
to
go
with
that,
and
there
are
people.
Now
I
mean
there's
this
new
ai
institute
on
trustworthy
machine
learning
for
weather
and
climate
m.a
ebert
upoff,
who
you
may
know,
is
involved
with
that,
and
it's
led
by
amy,
mcgovern
and
so
they're
directly
trying
to
say
how
do
we
make
an
ai
driven
model
or
forecast
that
can
be
directly
interpretable
by
humans
and
communities
etc?
B
This is all great so far. If folks want to weigh in in the chat, whether you want to hear about the avalanche work or you have questions, we can jump right into questions from the audience. You can type those in the chat or use the raise-hand feature, which is in the reactions pane at the bottom of the Zoom screen, and then ask your question out loud. Feel free to also turn your cameras on if you'd like, so we can make it a little bit more interactive.
B
While we wait for some questions, I guess, Claire, coming off of your last point on interpretability: do you have any interpretability examples that you can talk about from your work, especially maybe some of the downscaling work or other projects your group is working on?
A
Let's see, I did have something to say about that. I just want to be able to see my screen again.
A
Okay, so you can still see my shared screen, but I just wanted to check. I had some thoughts about that recently.
A
Yeah, because I'm secretly looking at something else. What can I say about interpretability?
A
Oh, okay. One way that generative learning has been used, and actually more so generative adversarial networks, is to generate possible instantiations of things. I've seen this in your community for precipitation maps and clouds, two examples where I've seen people give talks at Climate Informatics events, and I love these images and these clouds that they generate. But in at least several of these talks I've seen, so I guess we can go back to the interpretability slide:
A
How do I evaluate whether the distribution over the cloud images that I've generated is good? Does it match the distribution over cloud images in nature? I've sort of chuckled, because this is actually really an area of active research in AI itself, in interpretable AI: for GANs in general, whether you're applying them to clouds or to celebrity faces or what have you, you've now got this generative model that can give you distributions over images.
A
And it's almost philosophical, and certainly non-trivial, because if you had a formal description of the true distribution, like a generative model for the true distribution, you'd be done; you wouldn't even be learning. So in my group, on the theoretical side, we're looking at formal ways to evaluate generative models such as GANs. No interesting results to report yet, other than "GANs can't count," which is one preliminary experimental finding. But yeah, it's interesting.
C
Hi, thanks for a really interesting talk. I was wondering if you could discuss a bit more the idea of borrowing in space to substitute for short records in time. I have this notion of trying to do a skin graft, maybe, of one part of the world onto another, but it seems like you'd have to pick which places you could use to substitute. I don't know if you could just talk about that a little bit.
A
Yeah, right. So there are these canonical examples in AI where you can have training data that learns a perfect classification between an actual wolf and a husky dog.
A
But
then,
if
you
look
at
the
interpretation
of
what
part
of
the
images
you
were
looking
at,
they
were
just
looking
at
the
backgrounds
and
the
wolves
were
only
pictured
in
snow
in
the
training
data
set
and
the
huskies
were
pictured
on
grass,
for
example,
so
data
diversity
over
your
whole
feature
space
is
just
critical
for
machine
learning,
and
so
my
point
is
that
for
for
for
most
things,
you
know
our
only
our
data
only
goes
back
a
few
years
in
time.
A
Now,
if
you're
doing
something
like
real
time,
power,
output,
forecasting
of
solar
or
wind-
where
you
want
to
predict
on
minutes
down
to
seconds,
then
actually
you
have
plenty
of
of
data,
but
for
a
lot
of
you
know,
climate-based
things
where
you
care
about
monthly
or
decadal
or
annual.
We
really
have
a
very
limited
time
scale.
So
I
guess
what
I
was
more
thinking
is:
we
need
to
add
diversity
and
granularity
in
our
data
to
get
robust
models.
A
There may have been different laws or different activities in different regions, and so now you can get time-series data where you can kind of simulate the counterfactual via diversity over space. Of course, you'd have to control for all the other confounds in how these geographic locations differ.
B
Okay, great. Jerry, go ahead. Yeah, hi Claire, that was really interesting. I'm going to ask you a very naive question. I see the supervised learning part, because you've got a coarse grid that you're putting in and you have information about the time and space scales you're getting out; but in the unsupervised part you're giving it this, say, one-degree data that presumably doesn't know anything about
B
What's
going
on
on
the
scale
of
convective
organization,
for
example,
so
how
what's
it
picking
up
on
in
the
one
degree,
data
that
would
actually
produce
convective
organization
at
you
know
like
at
the
10
kilometer
level,
I
mean
it
it.
It
seems
like
you're
getting.
It
seems
like
it's
kind
of
magic,
and
it
must
be
picking
up
on
something
that
it's
that
the
input
in
the
input
data
that
tells
it
that
there's
something
at
the
sub
grid
scale
that
it's
going
to
give
you
in
the
output.
A
The
the
only
quote,
unquote
magic
is
that
the
images
so
the
predictions
of,
or
whatever
the
the
data
output
by
era
versus
the
data
output
by
worf
at
at
the
1
degree
resolution
or
the
1
8
degree
resolution
they're
registered
to
the
same
geographical
bounding
box.
A
It's literally just that there should be some shared geographical structure, because they are registered to be aligned on the same bounding box, and then you're seeing many instances at coarse resolution and many instances at high resolution. But I think the frustrating part, in terms of interpretability of many of these deep methods, is that we can't explicitly point to, or even believe, that anything was really learned about the physics.
A
There
is
there
are
people
bringing
physics
in
to
constrain
deep
learning
in
various
ways.
So
when
I
wrote
down
that
loss
function,
you
could
have
in
your
loss
function
some
encoding
of
a
physical
law,
for
example.
A
We
would
call
that
a
regularizer,
so
I
saw
anish
subermanian
is
on
the
call
he's
been
involved
with
that
kind
of
work.
I
believe
dj
gagne
has
been
involved
with
that
kind
of
work
as
well.
B
Thanks
claire
yeah,
we
still
have
time
for
questions
so
feel
free
to
folks
feel
free
to
jump
in.
I
guess
maybe
I'll
ask
kind
of
maybe
coming
off
of
jerry's
question.
B
The
I
think
will
be
interesting
for
folks
to
hear
about
more
is
the
physical
interpretation
of
the
latent
space,
and
I
know
this
is
something
that
you
and
I
have
talked
about.
I
think,
but
what
I
hadn't
seen
was
the
the
part
about
taking
a
walk
through
the
latent
space
and
extrapolating
through
time.
So
I
found
that
to
be
an
interesting
aspect
of
potentially
applying
how
you
might
use
the
information
from
this
latent
space
to
inform
you
know
other
extrapolations.
A
Sure. This is another instance where the interpretability that would be useful in climate and meteorology would also be useful in AI in general, and so interpretability and how to traverse the latent space are really bleeding-edge topics right now in core AI. I've seen AI papers coming out saying that, you know, the meaningful...
A
So now you've learned your posterior. You started with some silly prior, either a uniform or an isotropic Gaussian prior; then you've done your normalizing flow, so you've gotten a much more informative distribution over your latent space.
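A toy illustration of that step, with a single invertible affine layer standing in for the flow (the class and its parameters are hypothetical, just to show how the change-of-variables formula turns a standard-normal base density into a more informative one):

```python
import numpy as np

class AffineFlow:
    """One invertible affine transform x = a*z + b, the simplest
    normalizing-flow layer over a standard-normal base density."""
    def __init__(self, a, b):
        assert a != 0.0          # must be invertible
        self.a, self.b = a, b

    def forward(self, z):        # base sample -> data space
        return self.a * z + self.b

    def inverse(self, x):        # data space -> base sample
        return (x - self.b) / self.a

    def log_prob(self, x):
        # change of variables: log p(x) = log p_base(f^{-1}(x)) - log|a|
        z = self.inverse(x)
        log_base = -0.5 * (z ** 2 + np.log(2.0 * np.pi))
        return log_base - np.log(abs(self.a))
```

Real flows stack many such layers with learned, nonlinear parameterizations, but the log-likelihood bookkeeping is the same sum of log-determinant terms.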
A
How can you interpret what's there, or how could you traverse it so that an interpolation you make between two known data points has the correct interpretation? Papers are coming out saying that you should follow a geodesic path in your probability space. It's kind of like, you're all very familiar with geodesic paths on the earth or on a topographical map; now the topographical map is the level sets of probability in the posterior distribution that we've learned. And so what happens?
A
So there are these approximation techniques where you'll make maybe a linear approximation locally and then, you know, snap back to a geodesic; those sorts of iterative techniques trade the faithfulness of the path to the geodesic of the learned posterior against computation time. But yeah, I have had scientists challenge me and say, you know, you can't really say that that's a temporal interpolation, and right, that again gets into interpretability.
A
All I meant was that we know these simulations were representing precip at fine-grained resolution at two different time points from our NWP, right? And so we can find a path connecting them in latent space, subject to the quality of the path; ideally, it's geodesic on the learned distribution.
A
I'm not sure in terms of actually visualizing the latent space. I would also point people to work by Imme Ebert-Uphoff and Libby Barnes at CSU, where, I think not at this AGU but the previous one...
A
They have done work around visualizing latent spaces, I believe, and they're really trying to translate these models and make them more interpretable in the domain.
C
Thanks. Yeah, I guess another question. I was really excited to see your description of the domain alignment problem, which I just hadn't heard described that way. I guess it could maybe be related to a prediction problem where you have, say, a biased climate model in terms of making climate predictions, and you're trying to look for a mapping to get from that model to something else. I guess my question is about the well-posedness of the problem of finding f.
A
So if you had two random variables in nature, x and y, like, we were hoping those would be the precipitation field at high and low resolution, for example, then it's safe to say, or maybe slightly more plausible to say, that you can get access to i.i.d. samples from their marginals.
A
Just, you know, one parameter: what range should that parameter be in? You've already made a decision there, and then what distribution over that range are you sampling from when you run your ensemble of runs? So I think it would be promising.
A
I think there could be some cool research there, and I'd be really happy to chat more. What I'm seeing as the hard problem, whether you want to call it a statistics problem or a philosophical problem, is: where is the random variable? There were a lot of decisions and sort of engineering choices behind a climate model, so I think it is not really a random variable. You might be able to argue otherwise, or, given that it might not be...
A
How could we sample from it, or run it randomly in different ways? Because while you don't need paired data, you are relying on the i.i.d. assumption of access to the marginal of each of the two variables.
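That i.i.d.-marginals assumption can be made concrete with a small sketch (the function name and the use of NumPy index sampling are illustrative, not the actual training code): each minibatch is drawn independently from each domain, with no pairing anywhere.

```python
import numpy as np

def unpaired_batches(domain_x, domain_y, batch_size, rng):
    """Draw one minibatch from each domain's marginal, independently.
    No pairing between the X and Y samples is assumed or used, which
    is all that unpaired domain alignment (e.g. CycleGAN-style
    training) requires -- only i.i.d. access to each marginal."""
    ix = rng.integers(0, len(domain_x), size=batch_size)
    iy = rng.integers(0, len(domain_y), size=batch_size)
    return domain_x[ix], domain_y[iy]
```

The question raised here is whether an ensemble of climate model runs can honestly be treated as `domain_x` in this sense, i.e., as i.i.d. draws from some underlying marginal, given how many engineering decisions sit behind each run.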